Multi-class Protein Classification Using Adaptive Codes
Authors
Iain Melvin, Eugene Ie, Jason Weston, William Stafford Noble, Christina Leslie (the first two authors contributed equally to this work)
Abstract
Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Recent machine learning work in this domain has focused on developing new input space representations for protein sequences, that is, string kernels, some of which give state-of-the-art performance for the binary prediction task of discriminating between one class and all the others. However, the underlying protein classification problem is in fact a huge multi-class problem, with over 1000 protein folds and even more structural subcategories organized into a hierarchy. To handle this challenging many-class problem while taking advantage of progress on the binary problem, we introduce an adaptive code approach in the output space of one-vs-the-rest prediction scores. Specifically, we use a ranking perceptron algorithm to learn a weighting of binary classifiers that improves multi-class prediction with respect to a fixed set of output codes. We use a cross-validation set-up to generate output vectors for training, and we define codes that capture information about the protein structural hierarchy. Our code weighting approach significantly improves on the standard one-vs-all method for two difficult multi-class protein classification problems: remote homology detection and fold recognition. Our algorithm also outperforms a previous code learning approach due to Crammer and Singer, trained here using a perceptron, when the dimension of the code vectors is high and the number of classes is large. Finally, we compare against PSI-BLAST, one of the most widely used methods in protein sequence analysis, and find that our method strongly outperforms it on every structure classification problem that we consider.
Supplementary data and source code are available at http://www.cs.columbia.edu/compbio/adaptive.
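The code-weighting idea in the abstract can be illustrated with a small sketch. The abstract does not spell out the update rule, so what follows assumes a standard margin-based ranking perceptron: each class has a fixed code vector, a weight vector `w` rescales the one-vs-the-rest scores elementwise, and `w` is adjusted whenever the true class does not beat its best rival by a margin. The function names, the hyperparameters (`epochs`, `lr`, `margin`) and the toy scores are all hypothetical and are not taken from the paper or its released code.

```python
def score(w, f, code):
    """Weighted agreement between a score vector f and one class code."""
    return sum(wi * fi * ci for wi, fi, ci in zip(w, f, code))


def predict(w, f, codes):
    """Index of the class whose code best matches the weighted scores."""
    return max(range(len(codes)), key=lambda j: score(w, f, codes[j]))


def train_ranking_perceptron(examples, codes, epochs=10, lr=0.1, margin=1.0):
    """Learn one weight per code dimension (assumed update rule, see lead-in)."""
    dim = len(codes[0])
    w = [1.0] * dim  # all-ones weights reproduce the unweighted rule
    for _ in range(epochs):
        for f, y in examples:
            # Highest-scoring class other than the true one.
            rivals = [j for j in range(len(codes)) if j != y]
            r = max(rivals, key=lambda j: score(w, f, codes[j]))
            # Update only when the true class fails to win by the margin.
            if score(w, f, codes[y]) - score(w, f, codes[r]) < margin:
                for d in range(dim):
                    w[d] += lr * f[d] * (codes[y][d] - codes[r][d])
    return w


# Toy setup (made-up numbers): three classes with one-vs-all codes.
codes = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Each example is (vector of one-vs-the-rest scores, true class).
# Classifier 0 is deliberately miscalibrated and over-scores everything.
examples = [
    ([3.0, -1.0, -1.0], 0),
    ([2.0, 1.0, -1.0], 1),
    ([2.0, -1.0, 1.0], 2),
]

baseline = [predict([1.0, 1.0, 1.0], f, codes) for f, _ in examples]
w = train_ranking_perceptron(examples, codes)
learned = [predict(w, f, codes) for f, _ in examples]
print("unweighted predictions:", baseline)
print("weighted predictions:  ", learned)
```

With one-vs-all codes the learned weights simply rescale each binary classifier, which is enough to correct the miscalibrated classifier in this toy case. The abstract also describes codes that encode the structural hierarchy and output vectors generated by cross-validation; neither is modelled here, and the sketch evaluates on its own training examples purely for illustration.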
Similar resources
Adaptive Leader-Following and Leaderless Consensus of a Class of Nonlinear Systems Using Neural Networks
This paper deals with leader-following and leaderless consensus problems of high-order multi-input/multi-output (MIMO) multi-agent systems with unknown nonlinear dynamics in the presence of uncertain external disturbances. The agents may have different dynamics and communicate together under a directed graph. A distributed adaptive method is designed for both cases. The structures of the contro...
Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification
Nowadays, malicious URLs are a common threat to businesses, social networks, net-banking, etc. Existing approaches have focused on binary detection, i.e., whether the URL is malicious or benign. Very little literature has focused on detecting malicious URLs together with their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...
MULTI CLASS BRAIN TUMOR CLASSIFICATION OF MRI IMAGES USING HYBRID STRUCTURE DESCRIPTOR AND FUZZY LOGIC BASED RBF KERNEL SVM
Medical image segmentation partitions an image into a set of regions that are visually obvious and consistent with respect to some property such as gray level, texture or color. Brain tumor classification is an imperative and difficult task in cancer radiotherapy. The objective of this research is to examine the use of pattern classification methods for distinguishing different types of...
Adaptive Consensus Control for a Class of Non-affine MIMO Strict-Feedback Multi-Agent Systems with Time Delay
In this paper, a distributed adaptive controller is designed for a class of unknown non-affine MIMO strict-feedback multi-agent systems with time delay under a directed graph. The controller design is based on the dynamic surface control method. In the design process, radial basis function neural networks (RBFNNs) were employed to approximate the unknown nonlinear functions. S...
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
Journal title:
- Journal of Machine Learning Research
Volume 8
Publication date: 2007